ACCELERATOR ENGINE FOR PROCESSING FUNCTIONS USED IN AUDIO ALGORITHMS

BACKGROUND OF THE INVENTION 

1. Technical Field 

This invention relates generally to audio digital signal processing (DSP) and more 
particularly to processing functions used in audio algorithms.

2. Discussion of Background Art 

An Inverse Discrete Cosine Transformation (IDCT), which transforms data from 
the frequency domain to the time domain, requires a pre-multiplication, an inverse Fast 
Fourier Transformation (IFFT), and a post-multiplication. IDCT is used as one of the last
stages of the Dolby® third-generation audio coding (AC3) decompression process.

Performing a 128-point IFFT requires 64*7 (=448) radix-2 butterflies, each
defined by

$A' = A - BC$ and $B' = A + BC$,

where $A$, $A'$, $B$, $B'$, and $C$ are complex numbers of the form $D = d_r + j d_i$. A subscript $r$
denotes the real part, and a subscript $i$ denotes the imaginary part of the complex number.
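
Written out in real and imaginary parts, the butterfly takes four real multiplications followed by additions and subtractions on each part. The expansion below is a worked restatement of the definitions in this section; the lowercase names $a$, $b$, and $c$ for the parts of $A$, $B$, and $C$ are introduced here only for illustration.

```latex
\begin{aligned}
BC   &= (b_r c_r - b_i c_i) + j\,(b_r c_i + b_i c_r) \\
A'_r &= a_r - (b_r c_r - b_i c_i), & A'_i &= a_i - (b_r c_i + b_i c_r) \\
B'_r &= a_r + (b_r c_r - b_i c_i), & B'_i &= a_i + (b_r c_i + b_i c_r)
\end{aligned}
```

A 128-point transform has $\log_2 128 = 7$ passes of $128/2 = 64$ butterflies each, which gives the 448 butterflies stated above.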

FIG. 1 shows an audio integrated circuit chip 100 that includes a DSP 102, a
Random Access Memory (RAM) 106, a Read Only Memory (ROM) 110, and an
Input/Output (I/O) interface 114. DSP 102 provides the main digital signal processing for
chip 100 including, for example, filtering and transforming audio samples. RAM 106 is a
"memory on chip" used by programs running on DSP 102 to store data relating to input
audio samples and the processing performed on those samples. ROM 110 stores
additional data for DSP 102. I/O interface 114 implements various protocols for
exchanging, via databus 4005, audio samples with external devices such as analog-to-digital
(A/D) and digital-to-analog (D/A) converters, etc.

As audio AC3 and surround sound features are added to chip 100, DSP 102 
cannot perform as fast as desired, that is, it cannot execute as many million instructions 
per second (MIPs) as are required to do all of the tasks demanded by chip 100, including, 
for example, receiving AC3 data, detecting AC3 data error, using IDCT to decode data, 
and performing audio enhancement and 3D features. One approach to improving DSP 
102 performance, or to conserving DSP 102 MIPs for other desired functions, accelerates 
the AC3 decompressing stage. However, this approach requires operations that are tightly 
coupled with the AC3 algorithm, which in turn requires that a designer be intimately 
familiar with the AC3 decoding process. Further, the approach is useful for accelerating 
AC3, but provides no benefits for other audio functions or Moving Picture Experts Group
(MPEG) functions.

Therefore, what is needed is a mechanism to efficiently improve performance of 
DSP 102, and thereby performance of chip 100. 



SUMMARY OF THE INVENTION 
The present invention provides an accelerator engine running in parallel with a
DSP in an audio chip to improve its performance. The engine avoids AC3-specific
algorithms and focuses on general purpose DSP functions, including biquad filtering and
IDCT, the latter comprising pre-multiplication, IFFT, and post-multiplication. The DSP is
therefore free to do other processing while the engine is performing a requested function.
The engine utilizes its resources in a pipeline structure for increased efficiency. In the
biquad and double precision biquad modes the engine stores data in predefined locations in
memory to efficiently access the data. The engine also uses an equation and data values
stored in the predefined locations to calculate an audio sample. The calculated result is
then stored in a memory location in a way that it can easily be used to calculate the next
sample. In a preferred embodiment the engine saves 15 MIPs in an AC3-based
3D audio product, including 7.5 MIPs in the AC3 decoding process by accelerating the
IFFT, and 7.5 MIPs from the 3D processing via a biquad filtering function.

BRIEF DESCRIPTION OF THE DRAWINGS



FIG. 1 shows a prior art DSP chip;

FIG. 2 shows a chip utilizing the accelerator engine according to the invention;

FIG. 3 is a block diagram of the accelerator engine;

FIG. 4A shows a pipeline structure for the pre-multiplication mode;

FIG. 4B shows a resource utilization map for the pre-multiplication mode;

FIG. 5A shows a pipeline structure for the FFT mode;

FIG. 5B shows a resource utilization map for the FFT mode;

FIG. 5C shows C code describing the address generation for both 128-point and 64-point
FFTs;

FIG. 6A shows a pipeline structure for the biquad filtering mode;

FIG. 6B shows a resource utilization map for the biquad filtering mode;

FIG. 6C is a flowchart illustrating how the accelerator engine processes the biquad
filtering;

FIG. 6D is a flowchart illustrating how the accelerator engine stores data during the
biquad filtering mode;

FIG. 6E illustrates data in memory during a biquad filtering mode;

FIG. 7A shows a pipeline structure for the double precision biquad filtering mode;

FIG. 7B shows a resource utilization map for the double precision biquad filtering mode;

FIG. 7C illustrates data in memory during a double precision biquad filtering mode; and

FIG. 8 is a flowchart illustrating how a chip requests that a function be performed by the
accelerator engine.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The present invention provides an accelerator engine running in parallel with a 
DSP for processing functions that are usable by audio algorithms including, for example, 
AC3, 3D, bass management, MP3, etc., and that would otherwise be performed by the
DSP. Consequently, the DSP is free to process other tasks. 

FIG. 2 shows a chip 250 utilizing the invention. Integrated circuit chip 250 is like 
chip 100 except that chip 250 includes an accelerator engine 200 interfacing via data bus 
2003 with DSP 102. Data passing through databus 2003 includes configuration 
information and data for accelerator engine 200 to perform a requested function. 

FIG. 3 is a block diagram of accelerator engine 200 preferably including an RRAM 
304, an IRAM 308, a ROM 312, a control register 316, a state machine 320, a multiplier 
(MPY) 326, a shift/sign extender 328, an ALU 330, and other components. In the 
preferred embodiment accelerator engine 200 supports the following functions: single
biquad filtering; double precision biquad filtering; radix-2, 7-pass, 128-point IFFT;
radix-2, 6-pass, 64-point IFFT; pre-multiplication in 128-word and 64-word configurations;
post-multiplication in 128-word and 64-word configurations; IDCT in 128-word and 64-word
configurations; and BFE-RAM read/write.

RRAM 304, depending on the required function, stores different types of values. 
For example, for a biquad filtering function, RRAM 304 stores filter coefficients. For 
IFFT and IDCT functions, RRAM 304 stores the real parts and IRAM 308 stores the imaginary
parts of complex numbers required by these functions. Where appropriate, the data in
RRAM 304 and IRAM 308 is updated with new values each time accelerator engine 200 has
calculated an audio sample.



ROM 312 is used in IFFT and IDCT modes preferably to store both real and 
imaginary values for complex numbers required by IFFT and IDCT functions. 

State machine 320 generates addresses for RRAM 304, IRAM 308, and ROM 312 
and control signals for other circuitry of accelerator engine 200. 

MPY 326 is preferably a conventional 24-bit multiplier for multiplying data on
lines 3031 and 3035. 

Shifter/Sign-Extender 328 shifts its contents to adjust or mask a sign bit on lines 
3041 and 3045. 

ALU 330 is preferably a conventional ALU which accumulates, adds, or subtracts
data on lines 3051 and 3055. ALU 330 can be divided into an ALUA (not shown) and an
ALUB (not shown) for use when two ALUs are required, such as in the
pre-multiplication, IFFT, and post-multiplication modes.

Multiplexers (MUXes) M01, M03, M05, M07, M09, M11, and M13 perform
conventional functions of MUXes, that is, passing selected inputs based on select signals 
(not shown) provided by state machine 320 and appropriate circuitry. 

Latches L01, L03, L05, L07, L08, L09, L11, L13, L15, and L17 perform
conventional functions of latches including passing inputs based on clocking signals (not 
shown) provided by state machine 320 and appropriate circuitry. 

This specification uses the following notations in several tables for a pipeline 
structure and for a resource utilization map. Each column represents a critical resource 
(RRAM 304, IRAM 308, MPY 326, etc.). Each row represents a phase-one-to-phase-one 
clock cycle, which lasts 20ns in a preferred embodiment. The tables show what each 
resource is doing during each cycle. In a resource utilization map, the number in each
entry represents the number of operations a resource is executing during a given cycle. A
"0" indicates a resource is idle; a "1" indicates a resource is busy.

In the preferred embodiment the pre- and post-multiplication modes pass data and
multiply each data item by a unique complex constant preferably stored in ROM 312, that
is,

$A_n = A_n C_n$, where

$A_n = a_{rn} + j a_{in}$ and

$C_n = c_{rn} + j c_{in}$.

Parameters $a_{rn}$ and $a_{in}$ represent data in RRAM 304 and IRAM 308, respectively,
and $C_n$ are filter coefficients in ROM 312. In the preferred embodiment, $n$ ranges from 0
to 127 (for 128 data points). The pre- and post-multiplication modes are identical except
that data is accessed linearly in the pre-multiplication mode and in bit-reverse order in the
post-multiplication mode because the IFFT mode leaves its results in bit-reverse order.
Coefficients are accessed linearly in both the pre- and the post-multiplication modes.
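
The addressing difference between the two modes can be sketched as follows. This is a minimal illustration, not the engine's implementation: the function names `bit_reverse` and `multiply_mode`, and the use of Python lists as stand-ins for RRAM 304, IRAM 308, and ROM 312, are assumptions made for the example.

```python
def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value` (e.g. 0b001 -> 0b100 for bits=3)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def multiply_mode(rram, iram, rom, bit_reversed_data):
    """Multiply each data item by its complex constant, overwriting the data.

    rram/iram hold the real/imaginary parts of the data (stand-ins for
    RRAM 304 and IRAM 308); rom holds the complex constants (stand-in for
    ROM 312). Constants are always read linearly; data is read linearly
    (pre-multiplication) or in bit-reverse order (post-multiplication).
    """
    count = len(rom)
    bits = count.bit_length() - 1
    assert 1 << bits == count, "length must be a power of two"
    for n in range(count):
        addr = bit_reverse(n, bits) if bit_reversed_data else n
        product = complex(rram[addr], iram[addr]) * rom[n]
        rram[addr], iram[addr] = product.real, product.imag
```

With 128 data points, `bit_reverse(1, 7)` is 64, so in the post-multiplication mode the second constant is applied to the item at address 64.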

FIG. 4A shows a pipeline structure for the pre-multiplication mode. In cycle 1
accelerator engine 200 reads $b_r$ from RRAM 304, $b_i$ from IRAM 308, and $c_r$ from ROM
312. In cycle 2 accelerator engine 200 reads $c_i$ from ROM 312. In cycles 3 through 6
MPY 326 performs $b_r * c_r$, $b_i * c_i$, $b_r * c_i$, and $b_i * c_r$, respectively. Accelerator engine
200, instead of using the same $c_r$ and $c_i$ that were read from ROM 312 in cycles 1 and 2,
rereads $c_i$ and $c_r$ from ROM 312 in cycles 3 and 4. Rereading these values in cycles 3 and
4 avoids using a register to store the values read in cycles 1 and 2. Further, ROM 312 is
available for accessing its data in cycles 3 and 4. ALUA in cycles 6 and 7 performs
$A = b_r * c_r + 0$ and $A_0 = A - (b_i * c_i)$, respectively. ALUB in cycles 8 and 9 performs
$B = b_r * c_i$ and $B_0 = B + (b_i * c_r)$, respectively. In cycle 10 accelerator engine 200
writes $b_r$ and $b_i$ into RRAM 304 and IRAM 308, respectively. In this pipeline structure, data
required for a function in each cycle is made available before the data is needed. For
example, $b_r$ and $c_r$ used by MPY 326 in cycle 3 have been made ready to MPY 326 by reading
RRAM 304 and ROM 312 in cycle 1.
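
The arithmetic carried by this pipeline is one complex multiplication built from four real products. The sketch below mirrors the order of operations described for FIG. 4A; the function name `premultiply_item` and the local variable names are illustrative assumptions, and cycle timing is not modeled.

```python
def premultiply_item(b_r, b_i, c_r, c_i):
    """Return (real, imag) of (b_r + j*b_i) * (c_r + j*c_i)."""
    # MPY 326: the four real products, in the order given in the text.
    p_rr = b_r * c_r
    p_ii = b_i * c_i
    p_ri = b_r * c_i
    p_ir = b_i * c_r
    # ALUA: A = b_r*c_r + 0, then A0 = A - b_i*c_i (real part of the result).
    a = p_rr + 0
    a0 = a - p_ii
    # ALUB: B = b_r*c_i, then B0 = B + b_i*c_r (imaginary part of the result).
    b = p_ri
    b0 = b + p_ir
    return a0, b0
```

For example, `premultiply_item(1, 2, 3, 4)` gives the parts of (1 + 2j)(3 + 4j), which are -5 and 10.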

FIG. 4B shows a resource utilization map for the pre-multiplication mode, which, 
starting on any four cycle boundary and taking four cycles, shows the number of 
operations a resource is executing. For example, MPY 326 and ROM 312 perform four 
operations (all four 1's), one in each of the four selected cycles 1 to 4. ALUA performs
two operations (one in each of cycles 2 and 3) while ALUB performs two operations (one 
in each of cycles 1 and 4). Data is accessed in cycles 1 and 2 for RRAM 304 and IRAM 
308. Therefore, MPY 326 and ROM 312 are utilized 100% of the time (4 operations in 
four cycles) while each of ALUA, ALUB, RRAM 304, and IRAM 308 is utilized 50% of 
the time (2 operations in 4 cycles). 
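
The percentages follow directly from counting busy cycles in the four-cycle window. In the sketch below the 0/1 entries are transcribed from the prose above rather than from FIG. 4B itself, and the names `utilization` and `PREMULTIPLY_MAP` are hypothetical.

```python
def utilization(usage_map):
    """Return the percentage of busy cycles for each resource."""
    return {
        name: 100.0 * sum(cycles) / len(cycles)
        for name, cycles in usage_map.items()
    }


# One entry per cycle of the four-cycle window (1 = busy, 0 = idle).
PREMULTIPLY_MAP = {
    "MPY 326": [1, 1, 1, 1],
    "ROM 312": [1, 1, 1, 1],
    "ALUA": [0, 1, 1, 0],
    "ALUB": [1, 0, 0, 1],
    "RRAM 304": [1, 1, 0, 0],
    "IRAM 308": [1, 1, 0, 0],
}
```

Calling `utilization(PREMULTIPLY_MAP)` reports 100% for MPY 326 and ROM 312 and 50% for the other four resources.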

Accelerator engine 200 preferably uses seven passes of a radix 2 butterfly to 
process IFFTs, and employs the following equations: 

$A' = A - BC$ and $B' = A + BC$,
where $A$, $A'$, $B$, $B'$, and $C$ are complex numbers.

FIG. 5A shows a pipeline structure for the IFFT mode and FIG. 5B shows a
resource utilization map for the IFFT mode. The explanations of this IFFT pipeline
structure and resource utilization map are similar to the respective explanations of the
pre-multiplication mode in FIGs. 4A and 4B. For example, in cycle 1 accelerator engine 200
reads $b_r$ and $c_r$ from RRAM 304 and ROM 312, respectively; MPY 326 is utilized 100% of
the time because in the selected four cycles MPY 326 performs four operations; etc.
Consequently, as shown in FIG. 5B, all resources are utilized 100% of the time.

Accelerator engine 200 performs each pass of the IFFT mode sequentially in both 
128-point IFFT and 64-point IFFT. The difference between each pass is the method by 
which data points are addressed. The addressing schemes for the two modes are similar. 
FIG. 5C lists C code describing the address generation for both 128-point FFT and
64-point FFT modes.
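
A radix-2 inverse transform that takes linearly ordered input, runs one pass per power of two, and leaves its output in bit-reverse order can be sketched as below. This is not the address generation of FIG. 5C, which is not reproduced in this text. It is one standard arrangement with the same input and output ordering. The names `bit_reverse` and `ifft_bit_reversed_output` are hypothetical, the output is left unscaled, and the constant `c` is assumed to be the negated conventional twiddle factor so that the butterfly $A' = A - BC$, $B' = A + BC$ can be used exactly as written.

```python
import cmath


def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def ifft_bit_reversed_output(data):
    """Unscaled in-place radix-2 inverse FFT.

    Input is in linear order; on return data[bit_reverse(m)] holds output m.
    Returns (passes, butterflies) so the operation count can be checked.
    """
    n = len(data)
    bits = n.bit_length() - 1
    assert 1 << bits == n, "length must be a power of two"
    half = n // 2
    passes = 0
    butterflies = 0
    while half >= 1:
        for group, start in enumerate(range(0, n, 2 * half)):
            # Twiddle exponent for this group; constants are visited in
            # bit-reversed order because the data is in linear order.
            exponent = bit_reverse(group, bits) // 2
            # Assumption: C is the negated conventional twiddle factor.
            c = -cmath.exp(2j * cmath.pi * exponent / n)
            for k in range(start, start + half):
                a, b = data[k], data[k + half]
                data[k] = a - b * c          # A' = A - BC
                data[k + half] = a + b * c   # B' = A + BC
                butterflies += 1
        half //= 2
        passes += 1
    return passes, butterflies
```

Running it on 128 points performs 7 passes and 448 butterflies; on 64 points it performs 6 passes and 192 butterflies. The only structural difference between passes is the distance `half` between the two operands of a butterfly, which is an addressing difference.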

In the biquad filtering mode, accelerator engine 200 uses the equation: 

$y_n = b_0 x_n + b_1 x_{n-1} + b_2 x_{n-2} + a_1 y_{n-1} + a_2 y_{n-2}$ (1)

where the subscript $n$ indicates the current sample number; $x_n$ is the current input sample
and $y_n$ is the current output sample; and $b_0$, $b_1$, $b_2$, $a_1$, and $a_2$ are filter coefficients.

Accelerator engine 200 stores filter coefficients in RRAM 304 and input samples 
and filter states in IRAM 308. In the biquad filtering mode, 48-bit ALU 330 remains as 
one ALU (instead of being divided into two ALUs: ALUA and ALUB). 

FIG. 6A shows a pipeline structure for the biquad filtering mode. 

FIG. 6B shows a resource utilization map for the biquad filtering mode. This 
pipeline structure can be repeated every six cycles and all resources will be used five out 
of every six cycles. 

FIG. 6C is a flowchart illustrating a method for accelerator engine 200 to perform
a biquad filtering in accordance with the invention. In step 604, accelerator engine 200
receives, for example, 124 sample data points represented by $x_0$ to $x_{123}$. In step 608,
accelerator engine 200 stores data in IRAM 308. In step 612, accelerator engine 200 uses
equation (1) to calculate $y_n$ for $n=0$ to $n=123$, that is, $y_0$ to $y_{123}$, and stores them in
appropriate locations in IRAM 308. Accelerator engine 200 then continues to receive,
store, and calculate sampled data in respective steps 604, 608, and 612 until all data has
been received. Then accelerator engine 200 completes the biquad filtering function in step
620.

FIG. 6D is a flowchart illustrating how accelerator engine 200, in accordance with
steps 608 and 612 of FIG. 6C, calculates and stores $y_n$ in IRAM 308 locations for 124
samples of $x_n$ from $x_0$ to $x_{123}$. In step 604D accelerator engine 200 stores $x_0$ to $x_{123}$ in
locations 4 to 127. Accelerator engine 200 also stores values of $y_{-2}$, $y_{-1}$, $x_{-2}$, and
$x_{-1}$ in locations 0, 1, 2, and 3, respectively. In FIG. 6D, locations 0 to 127 are used for
illustrative purposes only; any 128 locations, for example, 1 to 128, 2 to 129, or K to K +
128 - 1, are applicable. In step 608D accelerator engine 200 uses the values in locations 0,
1, 2, 3, and 4 to calculate $y_0$. In step 612D accelerator engine 200 stores the value of $y_0$ in
location 2. Accelerator engine 200 then returns to step 608D to calculate $y_1$ and store $y_1$
in location 3, which is one location higher than location 2 storing $y_0$. Accelerator engine
200 keeps calculating and storing values of $y$ until accelerator engine 200 is done, that is,
accelerator engine 200 calculates and stores values of $y_2$ to $y_{123}$ in location 4 through
location 125, respectively. Accelerator engine 200, in calculating $y_0$ to $y_{123}$, uses values of
$a_2$, $a_1$, $b_2$, $b_1$, and $b_0$ preferably stored in ROM 312. Those skilled in the art will recognize
that calculating $y_0$ ($n=0$), based on equation (1), requires $y_{-2}$, $y_{-1}$, $x_{-2}$, $x_{-1}$, and $x_0$. The
invention uses the zero value for each of $y_{-2}$, $y_{-1}$, $x_{-2}$, and $x_{-1}$ to calculate the first
sequence of 124 $x_n$ samples.

FIG. 6E illustrates how IRAM 308 stores a data value for each $y_n$ from $y_0$ to $y_{123}$.
The "Address" column shows locations from 0 to 127 in IRAM 308. The "Initial Data"
column shows data of $y_{-2}$, $y_{-1}$, $x_{-2}$, $x_{-1}$, and $x_0$ to $x_{123}$ in corresponding locations of the
"Address" column. Columns n=0, n=1, n=2, n=3, ... to n=123 show data in IRAM
308 locations for $y_n$ for $n=0$ to $n=123$, respectively. Box 655 includes values (of $y_{-2}$, $y_{-1}$,
$x_{-2}$, $x_{-1}$, and $x_0$) that are used to calculate $y_0$. Similarly, boxes 659, 663, and 669 include
values ($y_{-1}$, $y_0$, $x_{-1}$, etc.) that are used to calculate $y_1$, $y_2$, and $y_3$, respectively. According to
the invention, IRAM 308 locations of values in boxes 655, 659, 663 and 669, etc., are
increased by one for each increment of $n$. For example, box 655 includes values in
locations 0 through 4, box 659 includes values in locations 1 through 5, box 663 includes
values in locations 2 through 6, and box 669 includes values in locations 3 through 7, etc.
Arrow 602 indicates that $y_1$ is stored in location three, which is one location higher than
the location of $y_0$. Similarly, arrow 604 indicates that $y_2$ is stored in location four, one
location higher than the location of $y_1$. Column n=123 shows that $y_0$ to $y_{123}$ are stored in
locations 2 to 125, respectively. Consequently, the invention, while writing the result of $y_n$
(e.g., $y_0$ in column n=0) over the oldest $x$ value (e.g., $x_{-2}$ in the "Initial Data" column),
permits the data for calculating $y_{n+1}$ (e.g., $y_1$) to appear perfectly ordered in the subsequent
five locations (e.g., location 1 through location 5).

In accordance with the invention, calculating and storing $y_n$ for subsequent sequences
of 124 samples of $x$, that is, calculating and storing $y_n$ for $n=124$ to $n=247$, for $n=248$ to
$n=371$, and for $n=372$ to $n=495$, etc., is similar to calculating and storing $y_n$ for $n=0$ to
$n=123$. The invention thus uses the same IRAM 308 locations from location 0 to location 127
for calculating and storing $y_n$ for subsequent sequences of 124 samples of $x$. As discussed
above, the invention uses the zero value for $y_{-2}$, $y_{-1}$, $x_{-2}$, and $x_{-1}$ for the first sequence of
124 samples of $x$. For a second sequence of 124 samples of $x$ the invention uses $y_{122}$, $y_{123}$,
$x_{122}$, and $x_{123}$ for $y_{-2}$, $y_{-1}$, $x_{-2}$, and $x_{-1}$, respectively. Similarly, for a third sequence of
124 samples of $x$ the invention uses $y_{246}$, $y_{247}$, $x_{246}$, and $x_{247}$ for $y_{-2}$, $y_{-1}$, $x_{-2}$, and
$x_{-1}$, respectively. In calculating $y_n$, accelerator engine 200 stores filter coefficients
preferably in RRAM 304 in order of $a_2$, $a_1$, $b_2$, $b_1$, and $b_0$.
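
The sliding five-location window can be sketched in software as follows. This is a minimal model under stated assumptions, not the engine's implementation: a Python list stands in for IRAM 308, floating-point values replace the engine's fixed-point words, and the names `biquad_block`, `filter_stream`, and `direct_form` are hypothetical. How the last four values are moved to locations 0 through 3 between sequences is not described in the text, so the copy in `filter_stream` is an assumption.

```python
BLOCK = 124  # samples per sequence; 4 history locations + 124 samples = 128


def biquad_block(iram, coeffs, count=BLOCK):
    """Filter `count` samples in place.

    iram on entry: [y[-2], y[-1], x[-2], x[-1], x[0], ..., x[count-1]].
    coeffs are ordered (a2, a1, b2, b1, b0) to match the window order.
    On return, y[0] .. y[count-1] sit in locations 2 .. count+1.
    """
    for n in range(count):
        window = iram[n:n + 5]  # y[n-2], y[n-1], x[n-2], x[n-1], x[n]
        y = sum(c * v for c, v in zip(coeffs, window))
        iram[n + 2] = y  # overwrites the oldest x value, x[n-2]


def filter_stream(samples, coeffs):
    """Run consecutive sequences, carrying the last two y and x values forward."""
    state = [0.0, 0.0, 0.0, 0.0]  # zero history for the first sequence
    output = []
    for start in range(0, len(samples), BLOCK):
        block = list(samples[start:start + BLOCK])
        iram = state + block
        biquad_block(iram, coeffs, len(block))
        output.extend(iram[2:2 + len(block)])
        # The last four locations hold the history for the next sequence.
        state = iram[len(block):len(block) + 4]
    return output


def direct_form(samples, coeffs):
    """Reference: equation (1) evaluated sample by sample."""
    a2, a1, b2, b1, b0 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 + a1 * y1 + a2 * y2
        out.append(y0)
        x2, x1, y2, y1 = x1, x0, y1, y0
    return out
```

Because the coefficient order matches the window order, each output is a single five-term multiply-accumulate over consecutive locations, which is the property the memory layout is designed to provide.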

The double precision biquad mode is similar to the (single) biquad mode, but the 
feedback state is stored and calculated as double precision for greater numerical stability. 
The equation for the double precision biquad mode is 

$y_n = b_0 x_n + b_1 x_{n-1} + b_2 x_{n-2} + a_1 yl_{n-1} + a_2 yl_{n-2} + a_1 yh_{n-1} + a_2 yh_{n-2}$ (2)

in which $yl_n$ represents the lower half bits and $yh_n$ represents the upper half bits of $y_n$. In
the preferred embodiment, a double precision term $y_n$ comprises 48 bits, and therefore
each term $yl_n$ and $yh_n$ comprises 24 bits.

FIG. 7A shows a pipeline structure for the double precision biquad mode. 

FIG. 7B shows a resource utilization map for the double precision biquad mode. In 
this mode accelerator engine 200 operates in a pipeline fashion that repeats every nine 
cycles. 
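
The repeat intervals translate into time per output sample once the cycle time is fixed. The sketch below uses the 20 ns cycle of the preferred embodiment and ignores pipeline fill and data transfer time, which is an assumption; the function names are hypothetical.

```python
CYCLE_NS = 20  # phase-one-to-phase-one cycle time in the preferred embodiment


def sample_time_ns(cycles_per_sample, cycle_ns=CYCLE_NS):
    """Steady-state time per output sample, in nanoseconds."""
    return cycles_per_sample * cycle_ns


def block_time_us(samples, cycles_per_sample, cycle_ns=CYCLE_NS):
    """Steady-state time for a sequence of samples, in microseconds."""
    return samples * cycles_per_sample * cycle_ns / 1000.0
```

Under these assumptions the single precision mode (six cycles) takes 120 ns per sample and the double precision mode (nine cycles) takes 180 ns per sample.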

The method for calculating $y_n$ in the double precision mode is similar to that for
calculating $y_n$ in the biquad mode except, instead of five values ($x_n$, $x_{n-1}$, $x_{n-2}$, $y_{n-1}$, and
$y_{n-2}$), accelerator engine 200 uses seven values ($x_n$, $x_{n-1}$, $x_{n-2}$, $yl_{n-1}$, $yl_{n-2}$, $yh_{n-1}$, and
$yh_{n-2}$), as required by equation (2), to calculate each $y_n$. Further, after calculating $y_0$, the
invention stores $yh_0$ and $yl_0$ in respective locations 2 and 4. Similarly, after calculating $y_1$, the
invention stores $yh_1$ and $yl_1$ in respective locations 3 and 5, each location being one higher
than the respective locations of $yh_0$ and $yl_0$. Thus for each $y_n$, the invention stores $yh_n$ and
$yl_n$ each in one location higher than the locations of respective $yh_{n-1}$ and $yl_{n-1}$.



FIG. 7C shows how accelerator engine 200 stores calculating and calculated
values in IRAM 308 for the double precision biquad mode. The "Address" column shows
locations from 0 to 127 in IRAM 308. The "Initial Data" column shows data of $yh_{-2}$, $yh_{-1}$,
$yl_{-2}$, $yl_{-1}$, $x_{-2}$, $x_{-1}$, and $x_0$ to $x_{121}$ in corresponding locations of the "Address" column.
Columns 0, 1, 2, 3, ... to 121 show data in IRAM 308 locations for $x_n$ and $y_n$ for $n=0$ to
$n=121$, respectively. Box 755 includes values (of $yh_{-2}$, $yh_{-1}$, $yl_{-2}$, $yl_{-1}$, $x_{-2}$, $x_{-1}$, and $x_0$) that
are used to calculate $y_0$. Similarly, boxes 759, 763, and 769 include values ($yh_{-1}$, $yh_0$, $yl_{-1}$,
$yl_0$, $x_{-1}$, etc.) that are used to calculate $y_1$, $y_2$, and $y_3$, respectively. According to the
invention, IRAM 308 locations of values in boxes 759, 763, 769, etc., are increased by
one for each increment of $n$. For example, box 755 includes values in locations 0 through
6, box 759 includes values in locations 1 through 7, box 763 includes values in locations 2
through 8, and box 769 includes values in locations 3 through 9, etc. Arrow 702 indicates
that $yh_1$ is stored in location 3, which is one location higher than the location of $yh_0$.
Arrow 703 indicates that $yl_1$ is stored in location 5, which is one location higher than the
location of $yl_0$. Similarly, arrows 704 and 705 indicate that $yh_2$ and $yl_2$ are each stored in
locations one higher than the respective locations of $yh_1$ and $yl_1$, etc. Column n=121
shows that $yh_0$ to $yh_{121}$ are stored in respective locations 2 to 123, while $yl_{120}$ and $yl_{121}$ are
stored in respective locations 124 and 125, and $x_{120}$ and $x_{121}$ are stored in respective
locations 126 and 127. Consequently, the invention, while writing the result of $yh_n$ and $yl_n$
(e.g., $yh_0$ and $yl_0$ in column n=0) over the oldest values of $yl$ and $x$ (e.g., $yl_{-2}$ and $x_{-2}$ in the
"Initial Data" column), permits the data for calculating $y_{n+1}$ (e.g., $y_1$) to appear perfectly
ordered in the subsequent seven locations (e.g., location one through location seven).

As in the biquad filtering mode, calculating and storing $y_n$ for subsequent sequences of 122
samples of $x$, that is, calculating and storing $y_n$ for $n=122$ to $n=243$, for $n=244$ to $n=365$,
and for $n=366$ to $n=487$, etc., is similar to calculating and storing $y_n$ for $n=0$ to $n=121$.
The invention thus uses the same IRAM 308 locations from location 0 to location 127 for
calculating and storing $y_n$ for subsequent sequences of 122 samples of $x$. As discussed
above, the invention uses the zero value for $yh_{-2}$, $yh_{-1}$, $yl_{-2}$, $yl_{-1}$, $x_{-2}$, and $x_{-1}$ for the first
sequence of 122 samples of $x$. For a second sequence of 122 samples of $x$ the invention
uses $yh_{120}$, $yh_{121}$, $yl_{120}$, $yl_{121}$, $x_{120}$, and $x_{121}$ for $yh_{-2}$, $yh_{-1}$, $yl_{-2}$, $yl_{-1}$, $x_{-2}$, and $x_{-1}$, respectively.
Similarly, for a third sequence of 122 samples of $x$ the invention uses $yh_{242}$, $yh_{243}$, $yl_{242}$,
$yl_{243}$, $x_{242}$, and $x_{243}$ for $yh_{-2}$, $yh_{-1}$, $yl_{-2}$, $yl_{-1}$, $x_{-2}$, and $x_{-1}$, respectively. In calculating $y_n$,
accelerator engine 200 stores filter coefficients preferably in RRAM 304 in order of $a_2$, $a_1$,
$a_2$, $a_1$, $b_2$, $b_1$, $b_0$.
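
The seven-location window of the double precision mode can be modeled in the same way. The sketch below is illustrative only: a Python list stands in for IRAM 308, and the way a result is split into upper and lower halves (`split_halves`, truncation to a 23-fractional-bit grid with the remainder as the lower half) is an assumption chosen so that the two halves sum to the full value. The text does not specify the engine's number format. The names `biquad_dp_block` and `BLOCK_DP` are hypothetical.

```python
import math

BLOCK_DP = 122  # 6 history locations + 122 samples = 128


def split_halves(y, frac_bits=23):
    """Assumed split: yh is y truncated to a coarse grid, yl is the remainder."""
    scale = 1 << frac_bits
    yh = math.floor(y * scale) / scale
    return yh, y - yh


def biquad_dp_block(iram, coeffs, count=BLOCK_DP):
    """Filter `count` samples in place using equation (2).

    iram on entry:
        [yh[-2], yh[-1], yl[-2], yl[-1], x[-2], x[-1], x[0], ..., x[count-1]]
    coeffs are ordered (a2, a1, a2, a1, b2, b1, b0) to match the window.
    """
    for n in range(count):
        window = iram[n:n + 7]
        y = sum(c * v for c, v in zip(coeffs, window))
        yh, yl = split_halves(y)
        iram[n + 2] = yh  # overwrites the oldest yl value, yl[n-2]
        iram[n + 4] = yl  # overwrites the oldest x value, x[n-2]
```

After a full sequence, locations 2 through 123 hold the upper halves, locations 124 and 125 hold the last two lower halves, and locations 126 and 127 still hold the last two input samples, matching the final column described for FIG. 7C.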

FIG. 8 is a flowchart illustrating how chip 100 invokes a function performed by
accelerator engine 200. In step 804 chip 100 writes to configuration register 316 to set
the required mode and halt accelerator engine 200. In step 808 chip 100 downloads
required data from chip 100 to accelerator engine 200. For the pre-multiplication, IFFT,
or post-multiplication modes chip 100 downloads data preferably in the order of 0 to 127
and alternating between RRAM 304 and IRAM 308. For the biquad mode, chip 100
downloads coefficients preferably in the order of $a_2$, $a_1$, $b_2$, $b_1$, and $b_0$ in RRAM 304.
Similarly, for the double precision biquad mode, chip 100 downloads coefficients
preferably in the order of $a_2$, $a_1$, $a_2$, $a_1$, $b_2$, $b_1$, and $b_0$. Chip 100 also downloads data in the
order from 0 to 127, which is the order shown in the "Initial Data" column in FIG. 6E. In
step 812 chip 100 determines whether all of the data has been downloaded. If the data is
not completely downloaded then chip 100 in step 808 keeps downloading data, but if data
is completely downloaded then chip 100 in step 816 sets the run bit in configuration
register 316 so that accelerator engine 200 in step 820 can perform the requested function.
Chip 100 in step 824 monitors the status of the done bit in configuration register 316 to
determine whether accelerator engine 200 has completed its requested task. In step 828
accelerator engine 200 completes its requested task, and, depending on the mode, chip
100 may or may not set the done bit in configuration register 316. For example, if the
requested task is a stand-alone pre-multiplication, then chip 100 sets the done bit, but if
the task is an IDCT function then chip 100 does not set the done bit because accelerator
engine 200 would continue to perform the IFFT function after completing the
pre-multiplication function. In step 832 chip 100, via bus 2003 (FIG. 2), reads data from
accelerator engine 200 in linear order except in the IFFT mode where IFFT functions
leave data in bit-reverse order.
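
The request sequence of FIG. 8 can be sketched from the requester's side as follows. Everything in this example is hypothetical: the `MockEngine` class, its attribute names, and the doubling operation that stands in for the requested function are placeholders, since the text does not give the register layout or bus protocol.

```python
class MockEngine:
    """Hypothetical stand-in for accelerator engine 200 (not the real register map)."""

    def __init__(self):
        self.mode = None
        self.run = False
        self.done = False
        self.rram = []
        self.iram = []

    def step(self):
        """Pretend to work for one poll; the placeholder function doubles the data."""
        if self.run and not self.done:
            self.rram = [2 * v for v in self.rram]
            self.iram = [2 * v for v in self.iram]
            self.done = True


def invoke(engine, mode, real_parts, imag_parts, max_polls=1000):
    """Follow steps 804 through 832 against the mock engine."""
    # Step 804: set the required mode and halt the engine.
    engine.mode, engine.run, engine.done = mode, False, False
    engine.rram, engine.iram = [], []
    # Steps 808 and 812: download in order, alternating between RRAM and IRAM.
    for real, imag in zip(real_parts, imag_parts):
        engine.rram.append(real)
        engine.iram.append(imag)
    # Step 816: set the run bit.
    engine.run = True
    # Steps 820 through 828: poll the done bit while the engine works.
    for _ in range(max_polls):
        engine.step()
        if engine.done:
            break
    else:
        raise TimeoutError("engine did not report done")
    # Step 832: read the results back.
    return list(engine.rram), list(engine.iram)
```

A bounded polling loop is used here so that a stalled engine surfaces as an error instead of hanging the caller; that choice is part of the sketch, not of the described method.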

The invention has been explained above with reference to a preferred embodiment. 
Other embodiments will be apparent to those skilled in the art after reading this disclosure. 
Therefore, these and other variations upon the preferred embodiment are intended to be 
covered by the appended claims. 